feat(topic-tree): hierarchical concept topic tree — experimental POC (flag-gated) #149

Draft
KylinMountain wants to merge 14 commits into main from feat/concept-topic-tree

@KylinMountain

Collaborator

Draft / experimental POC. Roadmap #3 ("hierarchical concept indexing for massive KBs"). Everything is behind a topic_tree config flag (default off) — with the flag off, behavior and the full test suite are unchanged. May not merge as-is; opening for review of the direction.

What this does

Turns the flat wiki/concepts/ into a topic tree — internal nodes are topic pages (_topic.md with an LLM-written summary), leaves are concept pages — that compile/query can descend instead of enumerating everything. Same vectorless, reasoning-based idea as PageIndex/ConDB, applied to OpenKB's own concept graph.
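The layout described above can be sketched as a directory walk. This is a minimal illustration and not the code in openkb/topic_tree.py: it assumes each topic is a directory holding a `_topic.md` summary, that concept pages are the other `.md` files in that directory, and that the choice of which child to enter is an injected callable (in the PR that decision belongs to the LLM). The return shape of `read_topic`, the `descend` helper, and the `max_depth` parameter are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

TOPIC_FILE = "_topic.md"  # file name taken from the PR description


def read_topic(node: Path) -> dict:
    """Return one topic node: its summary, child topics, and leaf concepts.

    The dict shape is an assumption made for this sketch.
    """
    summary_path = node / TOPIC_FILE
    summary = summary_path.read_text(encoding="utf-8") if summary_path.exists() else ""
    # Sub-directories are child topics; other .md files are concept leaves.
    children = sorted(p.name for p in node.iterdir() if p.is_dir())
    concepts = sorted(p.stem for p in node.glob("*.md") if p.name != TOPIC_FILE)
    return {"summary": summary, "children": children, "concepts": concepts}


def descend(
    root: Path,
    choose: Callable[[dict], Optional[str]],
    max_depth: int = 8,  # hypothetical safety bound
) -> Path:
    """Walk down from root, letting `choose` pick one child topic per level.

    Stops when `choose` returns None or names something that is not a child.
    """
    node = root
    for _ in range(max_depth):
        info = read_topic(node)
        pick = choose(info)
        if pick is None or pick not in info["children"]:
            break
        node = node / pick
    return node
```

Each step reads only one node's summary and child names, so the cost of a query grows with tree depth rather than with the total number of concepts.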

How to try it

```shell
# in a KB with some compiled concepts:
echo "topic_tree: true" >> .openkb/config.yaml
openkb reindex          # builds the topic tree from the flat concepts
openkb query "..."      # the agent now descends the tree (read_topic)
```

What's included

  • openkb/topic_tree.py — generic engine: read_topic / place_concept / split_node / top-down bootstrap (global cold-start seed) + place_topic_dir. Pure, unit-tested; LLM decisions are injected callables.
  • openkb/topic_tree_llm.py — LLM-backed choose / cluster / summarize.
  • openkb/lint.py — wikilinks resolve by bare stem and concepts/<stem> recursively, so links survive a concept being moved into a topic dir (Obsidian-style).
  • openkb/agent/{query,tools}.py: read_topic tool + tree-descent instructions (flag-gated).
  • openkb/cli.py: openkb reindex (flag-gated bootstrap).
  • openkb/agent/compiler.py: _write_concept(topic_dir=…) seam.
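The link-resolution rule in the lint.py item can be illustrated with a stem index built by a recursive scan. This is a sketch under assumptions, not the actual lint code: the function names `build_stem_index` and `resolve_wikilink` are hypothetical, the input is taken to be the text inside `[[...]]`, and stem collisions are settled by keeping the first path in sorted order.

```python
from pathlib import Path
from typing import Dict, Optional


def build_stem_index(concepts_root: Path) -> Dict[str, Path]:
    """Map each concept's bare stem to its file, wherever it is nested."""
    index: Dict[str, Path] = {}
    for path in sorted(concepts_root.rglob("*.md")):
        if path.name == "_topic.md":
            continue  # topic summaries are not concept pages
        index.setdefault(path.stem, path)  # first match wins (assumption)
    return index


def resolve_wikilink(target: str, index: Dict[str, Path]) -> Optional[Path]:
    """Resolve the inner text of a [[...]] link by stem.

    Accepts a bare stem ("attention") or a concepts/-prefixed target
    ("concepts/attention"); any other directory prefix is left unresolved.
    """
    # Drop an Obsidian-style alias ("|label") and heading anchor ("#section").
    name = target.split("|", 1)[0].split("#", 1)[0].strip()
    if "/" in name and not name.startswith("concepts/"):
        return None
    stem = name.rsplit("/", 1)[-1]
    if stem.endswith(".md"):
        stem = stem[: -len(".md")]
    return index.get(stem)
```

Because the lookup ignores the directory a concept lives in, moving a page into a topic directory changes the index value but not the key, so existing links keep resolving.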

Status

  • 893 tests pass; flag off → zero regression. New tests cover node IO, placement, split, top-down bootstrap (incl. recursion/depth), name-based + dir-prefixed link survival, the read_topic tool, and a full-stack e2e.
  • Real e2e: 1 ML paper + 6 cross-domain Wikipedia articles → 21 concepts → reindex → 7 clean domain branches, 0 broken links, coherent topic summaries.

Deferred (follow-ups)

  • T9 part 2 — live tree-aware _compile_concepts (so new docs compile sub-linearly without a reindex). Bigger: it cascades into _add_related_link / _backlink_* (flat-path assumptions). For now: compile flat → reindex.
  • Scale build: bootstrap clusters all briefs in one call (fine ≤ a few hundred; needs sampled/hierarchical seeding for 1M+); incremental + local merge/rebalance; bottom-up summary refresh; ConDB as the scale-tier retrieval backend (feat: OpenKB MVP — Karpathy's LLM Knowledge Base, powered by PageIndex #4). Design notes for these are kept local (repo rule: specs out of git).

Note

This branch also carries one unrelated local commit — cdac90d docs: compress architecture diagram to WebP (yours, never pushed). I'll lift it onto main separately so the PR is just the topic-tree commits before this goes out of draft.

https://claude.ai/code/session_018WiFnTo1YW9mtw47Fzir9K

Convert assets/openkb-architecture.png (1.78MB) to WebP at q85
(~195KB, -89%) and update the README reference. Visually lossless for
the diagram's text and line work.

…concepts

Real e2e (reindex over 7 cross-domain docs) surfaced 126 broken links: concept
bodies use the compiler's [[concepts/<stem>]] form, which stopped resolving once
reindex nested the files. Emit a path-independent concepts/<stem> target too.

Claude-Session: https://claude.ai/code/session_018WiFnTo1YW9mtw47Fzir9K

Cluster the full concept set top-down and recurse into oversized groups, instead
of greedy one-by-one placement that froze the high-level taxonomy on the first
~FANOUT_K concepts. Drop choose from bootstrap; place_concept/split_node remain
for the future live-incremental path. Per spec section 12.

Claude-Session: https://claude.ai/code/session_018WiFnTo1YW9mtw47Fzir9K
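
The top-down bootstrap described in the commit message above can be sketched as a short recursion. This is an illustration under assumptions rather than the branch's implementation: `cluster` stands in for the injected LLM callable and is assumed to map a list of concept names to named groups, while `max_leaf` and `max_depth` are hypothetical parameters (the source mentions a FANOUT_K constant and a depth limit without giving values).

```python
from typing import Callable, Dict, List, Union

# A node is either a sorted list of concept names (leaf group) or a dict
# of topic name -> child node.
Node = Union[List[str], Dict[str, "Node"]]


def bootstrap(
    concepts: List[str],
    cluster: Callable[[List[str]], Dict[str, List[str]]],
    max_leaf: int = 8,   # hypothetical: largest group left unsplit
    max_depth: int = 3,  # hypothetical: recursion bound
    depth: int = 0,
) -> Node:
    """Cluster the whole set at once, then recurse into oversized groups."""
    if len(concepts) <= max_leaf or depth >= max_depth:
        return sorted(concepts)
    groups = cluster(concepts)
    # If clustering makes no progress (one group or none), stop splitting.
    if len(groups) <= 1:
        return sorted(concepts)
    return {
        name: bootstrap(members, cluster, max_leaf, max_depth, depth + 1)
        for name, members in groups.items()
    }
```

Because the first `cluster` call sees every concept, the top-level topics reflect the full set, which is the property the greedy one-by-one placement lacked.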